Predicting renal function recovery and short-term reversibility among acute kidney injury patients in the ICU: comparison of machine learning methods and conventional regression

Abstract Background Acute kidney injury (AKI) is one of the most frequent complications of critical illness. We aimed to explore the predictors of renal function recovery and short-term reversibility after AKI by comparing logistic regression with four machine learning models. Methods We reviewed patients diagnosed with AKI in the MIMIC-IV database between 2008 and 2019. Recovery from AKI within 72 h of the initiating event was defined as short-term reversal of AKI. Conventional logistic regression and four machine learning algorithms (XGBoost, Bayesian networks [BNs], random forest [RF], and support vector machine [SVM]) were used to develop and validate prediction models. Model performance was compared using the area under the receiver operating characteristic curve (AU-ROC), calibration curves, and 10-fold cross-validation. Results A total of 12,321 critically ill adult AKI patients were included in the analysis cohort. The renal function recovery rate after AKI was 67.9%. The maximum and minimum serum creatinine (SCr) within 24 h of AKI diagnosis, the minimum SCr within 24 and 12 h, and the duration of antibiotic use were independently associated with renal function recovery after AKI. Among the 8364 recovered patients, the maximum SCr within 24 h of AKI diagnosis, the minimum Glasgow Coma Scale (GCS) score, the maximum blood urea nitrogen (BUN) within 24 h, vasopressin use, vancomycin use, and the maximum lactate within 24 h were the top six predictors of short-term reversibility of AKI. The RF model showed the best performance for predicting both renal function recovery (AU-ROC 0.8295 ± 0.01) and early recovery (AU-ROC 0.7683 ± 0.03), outperforming the conventional logistic regression model. Conclusions The maximum SCr within 24 h of AKI diagnosis was a common independent predictor of both renal function recovery and short-term reversibility of AKI. 
The RF machine learning algorithm showed a superior ability to predict the prognosis of AKI patients in the ICU compared with traditional regression models. These models may prove clinically helpful by assisting clinicians in providing timely interventions, potentially improving prognosis.


Introduction
Acute kidney injury (AKI) is one of the most common conditions in hospitalized patients, with an incidence of 10-15% among inpatients [1] that can reach 50-60% in critically ill populations [2]. Despite advances in healthcare, AKI remains independently associated with increased health care costs, longer hospital stays, and higher in-hospital morbidity and mortality [3][4][5]. The time to renal function recovery is also closely linked to outcomes: a 2016 study of nearly 17,000 patients showed that persistent AKI, compared with prompt recovery, was associated with higher morbidity and mortality [6]. The severity of AKI and the importance of timely treatment therefore make AKI an ideal candidate for predictive analytics.
Age, comorbidities, baseline renal function, and proteinuria have been shown to predict the probability of AKI recovery [7,8]. In addition, Srisawat et al. constructed a prediction model in which the APACHE-II score and the Charlson comorbidity index were key predictors [9]. However, their study included only a small group of patients (n = 76), which may limit the model's accuracy in real-time implementation. Overall, current research remains limited in its ability to predict whether an individual patient with AKI will recover and how long recovery will take.
To the best of our knowledge, only a few clinical studies have included a large number of patients and compared machine learning models with conventional regression models to predict renal function recovery and short-term reversibility after an episode of AKI. We therefore hypothesized that our prognostic models would be accurate enough to identify renal function recovery early in this vulnerable population. Earlier identification could improve prognosis by creating more opportunities to support recovery from AKI and to prevent further renal insults during an evolving injury, which may eventually lead to chronic kidney disease (CKD).

Sources of data
This retrospective study used data from a large critical care database, the Medical Information Mart for Intensive Care IV (MIMIC-IV), which contains laboratory and other test results, medication records, and diagnostic codes for more than 40,000 ICU patients treated at Beth Israel Deaconess Medical Center (Boston, MA) from 2008 to 2019 [10]. To obtain access to the database, we completed the National Institutes of Health's web-based course and passed the Protecting Human Research Participants exam (No. 9936285). This study was approved by the institutional review board of Peking University People's Hospital (Beijing Municipal Science, 7222199) and followed the Transparent Reporting of a Multivariable Prediction Model for Individual Prognosis or Diagnosis (TRIPOD) reporting guidelines.

Selecting an AKI cohort
Following the Kidney Disease: Improving Global Outcomes (KDIGO) clinical practice guidelines, we initially screened all adult patients who met the criteria for AKI within 48 h of ICU admission [11]. We excluded patients who were discharged or died within 48 h of ICU admission and those who stayed in the ICU for more than 90 days. We also excluded patients who died after achieving renal function recovery. Additional exclusion criteria were a history of long-term renal replacement therapy (RRT), a diagnosis of advanced CKD, missing creatinine values after the AKI diagnosis, and a maximum serum creatinine (SCr) lower than the baseline SCr.
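The creatinine arm of the KDIGO staging rules and the exclusion criteria above can be sketched in Python as follows. This is a minimal illustration, not the study's SQL extraction: it assumes SCr in mg/dL, ignores the urine-output criteria, and uses a hypothetical `IcuStay` record whose field names are invented for the example. The criterion excluding patients who died after recovery is left out because it depends on the outcome labels.

```python
from dataclasses import dataclass


def kdigo_scr_stage(baseline_scr: float, current_scr: float,
                    rise_48h: float = 0.0, on_rrt: bool = False) -> int:
    """KDIGO stage (0 = no AKI) from serum creatinine only, SCr in mg/dL.

    rise_48h: absolute SCr increase within the preceding 48 h.
    The 7-day window for the ratio criterion is assumed to be handled
    by the caller when choosing baseline_scr.
    """
    ratio = current_scr / baseline_scr
    has_aki = on_rrt or ratio >= 1.5 or rise_48h >= 0.3
    if not has_aki:
        return 0
    # Stage 3: >= 3x baseline, an increase to >= 4.0 mg/dL, or RRT initiation
    if on_rrt or ratio >= 3.0 or current_scr >= 4.0:
        return 3
    if ratio >= 2.0:
        return 2
    return 1


@dataclass
class IcuStay:
    # Hypothetical record layout, for illustration only
    icu_los_hours: float
    left_icu_within_48h: bool  # discharged or died within 48 h of admission
    long_term_rrt: bool
    advanced_ckd: bool
    has_post_aki_scr: bool
    baseline_scr: float
    max_scr: float


def passes_exclusions(stay: IcuStay) -> bool:
    """True if the stay survives the exclusion criteria described in the text."""
    return not (
        stay.left_icu_within_48h
        or stay.icu_los_hours > 90 * 24
        or stay.long_term_rrt
        or stay.advanced_ckd
        or not stay.has_post_aki_scr
        or stay.max_scr < stay.baseline_scr
    )
```

For example, a baseline of 1.0 mg/dL and a current value of 2.5 mg/dL gives stage 2 under these rules.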

Data collection and definition
Data were extracted from MIMIC-IV using Structured Query Language (SQL) in Navicat Premium (version 12.0.28). We obtained demographic and clinical data within the first 24 h after ICU admission and after the diagnosis of AKI. Comorbidities and diagnoses were identified from ICD-9 codes. The scoring systems included the Glasgow Coma Scale (GCS) score, the Simplified Acute Physiology Score II (SAPS II), and the Sequential Organ Failure Assessment (SOFA) score. Vital signs, including systolic blood pressure (SBP), heart rate (HR), peripheral oxygen saturation (SpO2), and temperature (T), were extracted. Laboratory data, including hemoglobin, leucocytes, basophils, monocytes, platelet count, lactate, albumin, anion gap, blood urea nitrogen (BUN), chloride, base excess, prothrombin time, and bicarbonate, were also recorded. Variables related to the AKI diagnosis, such as urine volume and the urine volume to weight ratio, were extracted as well, as were data on therapies, including vasopressors, antibiotics, furosemide, nephrotoxic drugs, mechanical ventilation, and RRT. Because vital signs and laboratory values were sampled frequently, we summarized each variable as its minimum, maximum, and mean value.
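A minimal sketch of the minimum/maximum/mean summarization step, assuming each variable is available as a list of `(timestamp, value)` pairs and that each window is a half-open interval starting at an anchor time such as ICU admission or AKI diagnosis. The function names and the `name_stat_Nh` feature-naming scheme are hypothetical.

```python
from datetime import datetime, timedelta
from statistics import mean


def summarize_window(measurements, anchor, hours=24):
    """Min, max and mean of (timestamp, value) pairs in [anchor, anchor + hours)."""
    end = anchor + timedelta(hours=hours)
    values = [v for t, v in measurements if anchor <= t < end]
    if not values:
        # No measurements in the window: leave the features missing
        return {"min": None, "max": None, "mean": None}
    return {"min": min(values), "max": max(values), "mean": mean(values)}


def window_features(name, measurements, anchor, hours=24):
    """Name the summaries (e.g. scr_min_12h) for use as model inputs."""
    stats = summarize_window(measurements, anchor, hours)
    return {f"{name}_{stat}_{hours}h": value for stat, value in stats.items()}


# Usage: creatinine values around a hypothetical AKI diagnosis time
aki_time = datetime(2019, 5, 1, 8, 0)
scr = [(aki_time, 1.8), (aki_time + timedelta(hours=10), 2.4),
       (aki_time + timedelta(hours=20), 2.1)]
print(window_features("scr", scr, aki_time, hours=12))
```

Calling the same helper with `hours=12` and `hours=24` yields features such as the minimum SCr within 12 h and within 24 h.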
In this study, renal function recovery was defined as no longer fulfilling the criteria for stage 1 AKI, recognizing that SCr might not yet have returned to baseline (defined as a return to <30% above baseline) before ICU discharge [12,13]. Non-recovery was defined as continuing to meet the AKI criteria, or dying, during the ICU stay. The AKI start time was the first time the patient met the KDIGO criteria, and the AKI recovery time was calculated as the time of recovery minus the start time. Renal function recovery within 72 h of the initiating event was defined as short-term reversal of AKI.
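The recovery and short-term reversal labels can be sketched as below. The text does not give an operational rule, so this sketch assumes that recovery occurs at the first SCr measurement after AKI onset that falls below 1.5 times baseline (no longer meeting the ratio criterion for stage 1), ignores the 0.3 mg/dL absolute criterion and urine output, and treats death in the ICU as non-recovery. Function names are hypothetical.

```python
from datetime import datetime, timedelta

SHORT_TERM_WINDOW = timedelta(hours=72)  # short-term reversal cut-off from the text


def recovery_time(scr_series, baseline_scr, aki_start):
    """First SCr time after AKI onset below 1.5 x baseline (assumed recovery rule)."""
    for t, scr in sorted(scr_series):
        if t > aki_start and scr < 1.5 * baseline_scr:
            return t
    return None


def label_outcome(scr_series, baseline_scr, aki_start, died_in_icu):
    """Return (recovered, short_term_reversal); short_term is None if not recovered."""
    t_rec = recovery_time(scr_series, baseline_scr, aki_start)
    if died_in_icu or t_rec is None:
        return False, None
    # Recovery duration = recovery time minus AKI start time
    return True, (t_rec - aki_start) <= SHORT_TERM_WINDOW
```

Only patients labeled as recovered would enter the second (short-term reversibility) model.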

Statistical analysis
Patients were divided into two groups according to whether they achieved renal function recovery, and variables were compared between the groups. Demographics and other characteristics were summarized as means and standard deviations, medians and interquartile ranges, or frequency counts and percentages. Categorical variables were compared using the chi-squared test, variables with discrete distributions using the Mann-Whitney U test, and continuous variables using the independent t-test. All data were analyzed using Python version 3.8 and R 4.0.5 (The R Foundation for Statistical Computing, Vienna, Austria), with statistical significance set at p < .05.
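The paper does not say which functions were used for the group comparisons. As an illustration only, the sketch below implements two of the named tests with the Python standard library: a Pearson chi-squared test on a 2×2 table (without continuity correction) and a two-sided Mann-Whitney U test using the normal approximation. In practice, library routines such as SciPy's `scipy.stats.chi2_contingency` (which applies Yates' correction to 2×2 tables by default) and `scipy.stats.mannwhitneyu`, or R's `chisq.test` and `wilcox.test`, would normally be used.

```python
import math
from statistics import NormalDist


def chi2_2x2(a, b, c, d):
    """Pearson chi-squared test for the table [[a, b], [c, d]], no continuity correction."""
    n = a + b + c + d
    rows, cols = (a + b, c + d), (a + c, b + d)
    observed = ((a, b), (c, d))
    stat = sum((observed[i][j] - rows[i] * cols[j] / n) ** 2 / (rows[i] * cols[j] / n)
               for i in range(2) for j in range(2))
    # Survival function of chi-squared with 1 df: P(X > x) = erfc(sqrt(x / 2))
    return stat, math.erfc(math.sqrt(stat / 2))


def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test, normal approximation, midranks for ties.

    The tie correction to the variance is omitted for brevity.
    """
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):
        j = i
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        for k in range(i, j + 1):
            ranks[k] = (i + j) / 2 + 1  # average 1-based rank of the tie block
        i = j + 1
    n1, n2 = len(x), len(y)
    r1 = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    u1 = r1 - n1 * (n1 + 1) / 2
    z = (u1 - n1 * n2 / 2) / math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return u1, p
```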

Development and validation prediction model
We randomly split the model development data into two parts: 90% for model derivation and 10% for internal validation. We developed conventional logistic regression, XGBoost, Bayesian network (BN), random forest (RF), and support vector machine (SVM) models on the training dataset and evaluated them on the validation dataset to identify the optimal predictors.
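A minimal sketch of the 90/10 random split, assuming patients are identified by IDs and that a fixed seed is used for reproducibility; the function name is hypothetical. The text describes a simple random split, so no stratification by outcome is applied here.

```python
import random


def split_90_10(ids, seed=0):
    """Shuffle record IDs and split them into 90% derivation / 10% validation."""
    rng = random.Random(seed)  # fixed seed makes the split reproducible
    shuffled = list(ids)
    rng.shuffle(shuffled)
    n_val = round(0.1 * len(shuffled))
    return shuffled[n_val:], shuffled[:n_val]


train_ids, val_ids = split_90_10(range(1000), seed=42)
print(len(train_ids), len(val_ids))  # 900 100
```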
In the conventional approach, each risk factor was first examined in univariate analysis, and a multivariate analysis was then conducted to build the best-fitting logistic regression model. XGBoost is a gradient-boosting method that uses a sparsity-aware split-finding algorithm and a weighted quantile sketch; weak learners are added sequentially to the ensemble to form a strong learner [14]. A BN is a graphical model in which each node corresponds to a random variable and each directed edge represents a conditional dependency between variables, quantified by conditional probabilities [15]. RF is an ensemble learning method that combines the results of multiple decision trees, each built on a bootstrap sample of the training dataset using a randomly selected subset of the predictors [16]. SVM is a classification algorithm that maps the training data into a high-dimensional feature space and finds the separating hyperplane with the largest margin, that is, the greatest distance between the hyperplane and the nearest samples of each class [17].
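To make the two sources of randomness in RF concrete, the following sketch builds a toy forest in pure Python. It is a deliberate simplification, not the study's implementation (which the paper does not specify): each tree is a depth-1 decision stump chosen by Gini impurity, so sampling a feature subset per tree coincides with sampling per split, whereas full RF implementations grow deep trees and draw a new subset at every split. Labels are assumed to be 0/1 and rows to be lists of numbers.

```python
import random
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def fit_stump(X, y, feature_ids):
    """Depth-1 tree: best (feature, threshold) by weighted Gini over feature_ids."""
    best = None
    for f in feature_ids:
        for thr in sorted({row[f] for row in X}):
            left = [lab for row, lab in zip(X, y) if row[f] <= thr]
            right = [lab for row, lab in zip(X, y) if row[f] > thr]
            if not left or not right:
                continue
            score = len(left) * gini(left) + len(right) * gini(right)
            if best is None or score < best[0]:
                best = (score, f, thr,
                        Counter(left).most_common(1)[0][0],
                        Counter(right).most_common(1)[0][0])
    if best is None:  # no usable split: predict the majority class everywhere
        major = Counter(y).most_common(1)[0][0]
        return feature_ids[0], float("inf"), major, major
    return best[1:]


def fit_forest(X, y, n_trees=25, max_features=None, seed=0):
    """Bagged stumps; each stump sees a bootstrap sample and a random feature subset."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    k = max_features or max(1, int(p ** 0.5))  # sqrt(p) features by default
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # bootstrap: n rows with replacement
        feats = rng.sample(range(p), k)
        forest.append(fit_stump([X[i] for i in idx], [y[i] for i in idx], feats))
    return forest


def predict_proba(forest, row):
    """Share of stumps voting for class 1 (majority vote if thresholded at 0.5)."""
    votes = [left if row[f] <= thr else right for f, thr, left, right in forest]
    return sum(votes) / len(votes)
```

Averaging many high-variance learners fitted on different bootstrap samples and feature subsets is what reduces variance relative to a single tree.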

Evaluating the performance of the models
To assess model quality, we used the AU-ROC as the primary measure for comparing the performance of the logistic regression and machine learning models. We also used 10-fold cross-validation, which provides a more stable and reliable estimate of model performance. To further assess the models, we plotted the percentage of observations above each probability threshold against the percentage of observations and evaluated secondary metrics of the clinical prediction models, including accuracy, sensitivity, specificity, precision, and recall [18].
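The two evaluation ideas above, AU-ROC and 10-fold cross-validation reported as mean ± SD, can be sketched as follows. AU-ROC is computed here through its equivalence with the Mann-Whitney statistic (the probability that a randomly chosen positive case scores higher than a randomly chosen negative case, with ties counting one half); the pairwise loop is O(n²) and meant for illustration. The folds are stratified by outcome, which is an assumption since the paper does not state whether stratification was used, and `fit` and `predict` are placeholders for any of the five models.

```python
import random
from statistics import mean, stdev


def auroc(labels, scores):
    """P(random positive scores higher than random negative); ties count 0.5."""
    pos = [s for s, lab in zip(scores, labels) if lab == 1]
    neg = [s for s, lab in zip(scores, labels) if lab == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def stratified_kfold(labels, k=10, seed=0):
    """Yield (train_idx, test_idx) pairs with class proportions preserved per fold."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        idx = [i for i, lab in enumerate(labels) if lab == cls]
        rng.shuffle(idx)
        for j, i in enumerate(idx):
            folds[j % k].append(i)  # deal each class round-robin across folds
    for f in range(k):
        test = sorted(folds[f])
        test_set = set(test)
        train = [i for i in range(len(labels)) if i not in test_set]
        yield train, test


def cross_validated_auroc(X, y, fit, predict, k=10, seed=0):
    """Mean and SD of fold-level AU-ROC, as in 'AU-ROC 0.77 ± 0.03'-style reporting."""
    aucs = []
    for train, test in stratified_kfold(y, k, seed):
        model = fit([X[i] for i in train], [y[i] for i in train])
        scores = [predict(model, X[i]) for i in test]
        aucs.append(auroc([y[i] for i in test], scores))
    return mean(aucs), stdev(aucs)
```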

Results
The demographics, clinical characteristics, and AKI metric measurements
In total, 44,486 patients met the KDIGO criteria within the first 48 h after ICU admission. After applying the exclusion criteria, the final analysis cohort consisted of 12,321 eligible patients (Figure 1). In the analytic cohort, the mean age was 66.7 years; male patients accounted for 56.0% (n = 6901) and white patients for 67.9% (n = 8366) of the cohort, and the minimum GCS score was 13. The cohort was divided into an AKI recovery group (n = 8364, 67.9%) and an AKI non-recovery group (n = 3957, 32.1%). Of the included patients, 9460 were in stage 1 (6534 AKI recovery versus 2926 AKI non-recovery), 2634 were in stage 2 (1716 versus 918), and 227 were in stage 3 (114 versus 113). The baseline demographics, clinical characteristics, interventions, and outcomes are summarized in Table 1.

The features selected and model comparison in renal function recovery
The results of the logistic regression analysis are shown in Table 2. Among the four machine learning models, RF showed the best predictive performance. According to the RF model's feature contributions, the maximum and minimum SCr within 24 h of AKI diagnosis, the minimum SCr within 24 h, the minimum SCr within 12 h, and antibiotic duration were the top five predictors of renal function recovery (Figure 2).
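The paper does not state how the RF feature contributions were computed; many RF libraries default to impurity-based importance. As an alternative, model-agnostic illustration, the sketch below computes permutation importance: the average drop in a performance metric when one predictor column is randomly shuffled. `predict` and `metric` are placeholders supplied by the caller, and rows are assumed to be Python lists.

```python
import random


def permutation_importance(X, y, predict, metric, n_repeats=5, seed=0):
    """Mean drop in `metric` when each feature column is shuffled.

    predict(row) -> score; metric(labels, scores) -> float (higher is better).
    """
    rng = random.Random(seed)
    baseline = metric(y, [predict(row) for row in X])
    importances = []
    for f in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[f] for row in X]
            rng.shuffle(column)  # break the link between feature f and the outcome
            X_perm = [row[:f] + [v] + row[f + 1:] for row, v in zip(X, column)]
            drops.append(baseline - metric(y, [predict(row) for row in X_perm]))
        importances.append(sum(drops) / n_repeats)
    return importances
```

Ranking features by the returned values gives a "top predictors" list analogous in spirit to Figure 2, although the numbers would differ from impurity-based importances.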
A total of 846 (10%) patients were included in the model validation phase. Discrimination and calibration were assessed using the AU-ROC (Figure 3) and calibration curves (Figure 4), respectively, during the model development and validation phases. The RF model showed significantly better discrimination than the traditional logistic regression model, with a higher AU-ROC and a narrower 95% confidence interval (AU-ROC 0.8597, 95% CI 0.84-0.88 versus 0.8143, 95% CI 0.78-0.83) (Figure 3). Table 3 presents the performance measures of the five models in identifying AKI recovery and non-recovery status. In terms of sensitivity and precision on the validation set, the RF model achieved a more balanced result than logistic regression.
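The threshold-based measures reported in Table 3 and the calibration curves can be sketched as follows. The 0.5 probability cut-off and the ten equal-width bins are assumptions made for illustration; the paper does not report the threshold or binning it used. Sensitivity and recall are the same quantity.

```python
def threshold_metrics(labels, probs, threshold=0.5):
    """Accuracy, sensitivity (recall), specificity and precision at a cut-off."""
    tp = sum(1 for lab, p in zip(labels, probs) if lab == 1 and p >= threshold)
    fn = sum(1 for lab, p in zip(labels, probs) if lab == 1 and p < threshold)
    tn = sum(1 for lab, p in zip(labels, probs) if lab == 0 and p < threshold)
    fp = sum(1 for lab, p in zip(labels, probs) if lab == 0 and p >= threshold)

    def ratio(num, den):
        return num / den if den else float("nan")  # undefined when no cases

    return {
        "accuracy": ratio(tp + tn, len(labels)),
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "precision": ratio(tp, tp + fp),
    }


def calibration_curve(labels, probs, n_bins=10):
    """(mean predicted probability, observed event rate) per non-empty bin."""
    bins = [[] for _ in range(n_bins)]
    for lab, p in zip(labels, probs):
        b = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes into the top bin
        bins[b].append((p, lab))
    return [(sum(p for p, _ in m) / len(m), sum(lab for _, lab in m) / len(m))
            for m in bins if m]
```

A well-calibrated model produces points close to the diagonal, where the mean predicted probability equals the observed event rate.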

The model establishment and comparison of the short-term reversibility of AKI
The 8364 recovered patients were randomly split into training and validation cohorts of 7619 (90%) and 846 (10%) patients, respectively. Logistic regression showed that the minimum GCS score, the urine volume to weight ratio within 24 h, the maximum SCr within 24 h of AKI diagnosis, the maximum BUN within 24 h, the maximum lactate within 24 h, the minimum anion gap, antibiotic duration, vancomycin, vasopressin, phenylephrine, furosemide, and ventilation within 24 h of AKI diagnosis were significantly associated with short-term renal function recovery (Table 4).
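The multivariable logistic regression behind odds ratios such as those in Table 4 can be sketched with plain gradient ascent on the log-likelihood. This is an illustration under stated assumptions, not the authors' fitting procedure: it assumes standardized numeric predictors and a 0/1 outcome, omits standard errors and p-values, and uses hypothetical function names. Established routines such as R's `glm(..., family = binomial)` or `statsmodels.api.Logit` would be used in practice.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X, y, lr=0.5, epochs=5000):
    """Maximum-likelihood logistic regression by batch gradient ascent.

    Returns [intercept, beta_1, ..., beta_p]. Assumes standardized predictors;
    no regularization, standard errors or convergence check.
    """
    n, p = len(X), len(X[0])
    w = [0.0] * (p + 1)
    for _ in range(epochs):
        grad = [0.0] * (p + 1)
        for row, label in zip(X, y):
            # Gradient of the log-likelihood: (y - predicted probability) * x
            err = label - sigmoid(w[0] + sum(b * x for b, x in zip(w[1:], row)))
            grad[0] += err
            for j, x in enumerate(row):
                grad[j + 1] += err * x
        w = [b + lr * g / n for b, g in zip(w, grad)]
    return w


def odds_ratios(w):
    """Odds ratio per one-unit increase in each predictor: exp(beta_j)."""
    return [math.exp(b) for b in w[1:]]
```

With toy data in which the outcome rate is 25% when x = 0 and 75% when x = 1, the fitted odds ratio approaches 9, the ratio of the two group odds (3 divided by 1/3).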
Among the four machine learning models, RF again showed the highest predictive performance. The maximum SCr within 24 h of AKI diagnosis, the minimum GCS score, the maximum BUN within 24 h, vasopressin, the maximum lactate within 24 h, and vancomycin were notably associated with short-term renal function recovery (Figure 5). However, RF, the best-performing machine learning algorithm, was only slightly better than traditional logistic regression (AU-ROC 0.7683 ± 0.03 versus 0.7669 ± 0.03). Table 5 compares the models' performance using 10-fold cross-validation.

Discussion
Early identification of high-risk AKI patients may help clinicians adapt treatment to avoid further deterioration of renal function. In addition, identifying patients whose AKI is unlikely to reverse in the short term may help determine the optimal timing of RRT for these patients. By comparing conventional logistic regression with four machine learning algorithms, we developed and tested models for ICU patients with AKI to help assess the probability of renal function recovery and predict the short-term reversibility of AKI.
Few studies have predicted the prognosis of AKI accurately enough to support clinical decision-making. Several clinical tools, including prediction models [19,20], urinary indices [21,22], novel biomarkers [23,24], and imaging techniques [25], have been evaluated in previous studies for predicting renal recovery or progression to severe AKI.
In our models, SCr within 24 h of AKI diagnosis may be an important indicator of both renal function recovery and short-term reversibility. Consistent with our findings, a recent study of 8320 critically ill patients with AKI reported that SCr predicted persistent AKI, with an AUC of 0.74 (95% CI 0.71-0.77) [19].
The GCS score has been widely adopted for assessing clinical severity and predicting outcomes after brain injury [26]. Moore and coworkers reported an AKI incidence of 9% in traumatic brain injury patients with GCS scores below 13 [27], while Zacharia et al. reported an AKI incidence of 23% in patients with aneurysmal subarachnoid hemorrhage [28]. In our study, patients with higher GCS scores were more likely to achieve renal function recovery within 72 h. This highlights the importance of early recognition of renal risk and may prompt clinicians to provide early renal-directed care for patients with low GCS scores. A recent study also reported that some medications can protect both the brain and the kidney by inhibiting the inflammatory processes triggered by brain trauma, thereby reducing the incidence of AKI in neurotrauma [29]. As expected, vancomycin has been independently associated with kidney injury [30], and acute interstitial nephritis has been reported as a mechanism of vancomycin-associated AKI [31,32]. In our study, patients treated with vancomycin also showed notably poorer short-term reversibility of AKI.
Positive fluid balance has been associated with adverse outcomes and increased mortality in patients with septic shock treated with vasopressin versus norepinephrine [33] and in patients with acute lung injury managed with conservative versus liberal fluid strategies [34]. Moreover, patients with impaired renal function are more prone to fluid accumulation because the kidney's ability to regulate water balance deteriorates [35]. Bouchard et al. reported that fluid overload was associated with non-recovery of renal function in critically ill patients with AKI [36]. Similarly, in our short-term recovery model, vasopressin use was associated with reduced reversibility of renal function.
Rising lactate levels reflect insufficient tissue perfusion, oxygen supply, and metabolism [37]. A previous study by Yan et al. showed that patients with poor baseline renal function have higher lactate levels. This is consistent with our finding that lactate was an indicator of kidney function, particularly for predicting early AKI recovery [38].
In this study, the RF machine learning algorithm achieved better predictive performance than conventional logistic regression, especially for predicting renal function recovery. SCr measured within 24 h of AKI diagnosis may indicate both the likelihood of renal function recovery and the recovery time. Interestingly, the GCS score may help clinicians not only evaluate neurocognitive impairment in ICU patients but also predict the duration of renal function impairment. Furthermore, vancomycin and vasopressin use were strong predictors of short-term irreversibility of renal function.
Despite the good performance of our prediction models, several limitations should be acknowledged. First, although the cohort was large, the models were validated only with an internal dataset. Second, novel biomarkers, which are potential predictors of renal function recovery but are not routinely measured in clinical settings, were not included in our prediction models. Therefore, a multicenter prospective study is needed to confirm the predictive value of the factors identified in our study.

Conclusion
In this large retrospective cohort study, by comparing a conventional regression model with four machine learning algorithms, we developed two RF models with high practicability and interpretability to predict renal function recovery and short-term reversibility. The maximum SCr within 24 h of AKI diagnosis, the minimum GCS score, vasopressin, and vancomycin were notably associated with short-term reversibility of renal function. Predicting the recovery time of AKI may help assess the likelihood that a patient will need RRT and, ultimately, help determine the appropriate timing for initiating RRT.